Package Maintenance and Automation

Dr Randy Johnson

Hood College

2026-04-16

Acknowledgements

Gemini Code Assist was active during the preparation of these notes, and some of its autosuggestions were incorporated into the final text.

Principles of Package Maintenance & Sharing

  • Sharing code isn’t just about giving away software
    • Advancing science
    • Scientific reproducibility
    • Building a resilient community
  • Good maintenance spreads knowledge and lowers the entry barrier

The Bus Factor

  • How many team members need to be hit by a bus (or win the lottery and quit) before the project completely stalls?
  • A bus factor of 1 is dangerous
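One rough way to put a number on the bus factor, assuming you already know who has worked on each file (for example from `git log`), is to find the smallest group of people whose departure leaves more than half of the files with nobody who knows them. The `bus_factor` helper, the 50% threshold, and the example repository are all hypothetical; this is a simplified sketch, not a standard metric.

```python
from itertools import combinations


def bus_factor(authors_by_file: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Smallest number of departures that orphans more than `threshold` of the files.

    A file is "orphaned" when every person who has worked on it is gone.
    Brute force over all groups of people: only practical for small teams.
    """
    people = sorted(set().union(*authors_by_file.values()))
    total = len(authors_by_file)
    for k in range(1, len(people) + 1):
        for gone in combinations(people, k):
            gone_set = set(gone)
            orphaned = sum(1 for authors in authors_by_file.values() if authors <= gone_set)
            if orphaned / total > threshold:
                return k
    return len(people)


# Hypothetical repository where one person wrote almost everything
repo = {
    "core.py": {"alice"},
    "io.py": {"alice"},
    "cli.py": {"alice"},
    "README.md": {"bob"},
}
print(bus_factor(repo))  # 1: if alice leaves, 3 of 4 files have no one who knows them
```

Counting everyone who ever touched a file is generous; a stricter version might only count people with substantial, recent commits to that file.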

Components of a Good Repository

  • README.md
    • Audience: new users
    • Project title and description
    • Installation steps
    • Basic usage examples

Components of a Good Repository

  • CONTRIBUTING.md
    • Audience: advanced users who are interested in contributing
    • Explain how to set up the dev environment locally
    • Code style guidelines
    • PR process

Components of a Good Repository

  • LICENSE
    • Without a license, default copyright applies: others have no legal permission to use, copy, modify, or share your code
    • MIT: do whatever you want (just keep the copyright notice), don’t sue me
    • GPL: if you distribute a modified version, you must share its source under the GPL too

Components of a Good Repository

  • CHANGELOG.md
    • Audience: existing users
    • Don’t rely purely on commit history
    • Write human-readable logs organized by Added, Changed, Deprecated, Removed, Fixed, and Security
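As a sketch of that layout, the hypothetical `render_release` helper below turns a dictionary of changes into a Markdown section, always listing the categories in the same order and skipping empty ones. The heading style loosely follows the Keep a Changelog convention; the version, date, and entries are made up.

```python
CATEGORIES = ("Added", "Changed", "Deprecated", "Removed", "Fixed", "Security")


def render_release(version: str, date: str, changes: dict[str, list[str]]) -> str:
    """Render one CHANGELOG.md section, grouping entries by category."""
    unknown = set(changes) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    lines = [f"## [{version}] - {date}"]
    for category in CATEGORIES:  # fixed order, so every release reads the same way
        entries = changes.get(category, [])
        if entries:
            lines += ["", f"### {category}"]
            lines += [f"- {entry}" for entry in entries]
    return "\n".join(lines)


print(render_release("2.5.0", "2026-04-16", {
    "Fixed": ["Crash when the input FASTA file is empty"],
    "Added": ["A --threads option for parallel alignment"],
}))
```

Rejecting unknown categories keeps contributors from inventing their own headings, which is what makes the log scannable for existing users.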

Versioning

  • MAJOR.MINOR.PATCH (e.g. v2.4.1)
  • PATCH (2.4.1 -> 2.4.2)
    • Bug fixes
    • If users update, nothing breaks

Versioning

  • MINOR (2.4.1 -> 2.5.0)
    • New features
    • Fully backward compatible
  • MAJOR (2.4.1 -> 3.0.0)
    • Breaking changes
    • Users will need to update their own code to use this new version
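The bump rules above are mechanical enough to write down. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings with an optional leading `v` (pre-release tags such as `-rc.1` are not handled); `parse` and `bump` are illustrative names, not a library API. Comparing parsed tuples also shows why versions should never be compared as strings.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split 'v2.4.1' or '2.4.1' into (major, minor, patch) integers."""
    major, minor, patch = (int(part) for part in version.removeprefix("v").split("."))
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' release."""
    major, minor, patch = parse(version)
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown part: {part!r}")


print(bump("v2.4.1", "patch"))  # 2.4.2
print(bump("v2.4.1", "minor"))  # 2.5.0
print(bump("v2.4.1", "major"))  # 3.0.0
print(parse("2.10.0") > parse("2.9.0"))  # True (numeric comparison)
print("2.10.0" > "2.9.0")  # False (string comparison gets it wrong)
```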

GitHub Tools for Maintenance

  • Issues
  • Pull Requests (PR)
  • Releases / Packages

Issue Tracking & Management

  • Writing good bug reports
    • Reproduction steps
    • Expected behavior
    • Actual behavior
    • “It’s broken” is not a useful bug report
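A maintainer (or a small bot) could check new issues for these sections before triaging them. A minimal sketch; the `missing_sections` helper, the exact heading names, and the sample report are assumptions, and a plain substring search is deliberately crude.

```python
REQUIRED_SECTIONS = ("Reproduction steps", "Expected behavior", "Actual behavior")


def missing_sections(issue_body: str) -> list[str]:
    """Return the required sections the issue text never mentions (case-insensitive)."""
    body = issue_body.lower()
    return [section for section in REQUIRED_SECTIONS if section.lower() not in body]


# Hypothetical report with all three sections filled in
good_report = """
Reproduction steps: run analyze_sequences.py on an empty FASTA file
Expected behavior: an empty results table
Actual behavior: ZeroDivisionError
"""
print(missing_sections(good_report))    # []
print(missing_sections("It's broken"))  # ['Reproduction steps', 'Expected behavior', 'Actual behavior']
```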

Issue Tracking & Management

  • Labels & Milestones
    • Issue labels (e.g. “good first issue” to attract beginners)
    • Milestones group issues together for a specific target release (e.g. “Version 2.0 Launch”)
    • Issue templates can be a useful tool for getting more helpful bug reports from users

Issue Tracking & Management

Example issue template on GitHub

Pull Requests (PR)

  • Not just for contributing to open source projects: useful in your own repositories too
  • Branching
    • Don’t push directly to main
    • Example branches: fix/login-bug or feature/dark-mode
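Teams sometimes enforce a branch naming convention in CI or a pre-push hook. A sketch of one hypothetical convention (a type prefix, a slash, then a lowercase hyphenated description); the `docs/` and `chore/` prefixes and the `is_valid_branch` helper are assumptions beyond the two examples above.

```python
import re

# Hypothetical convention: <type>/<lowercase-words-separated-by-hyphens>
BRANCH_PATTERN = re.compile(r"(fix|feature|docs|chore)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch(name: str) -> bool:
    """True if the whole branch name follows the convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


for name in ["fix/login-bug", "feature/dark-mode", "main", "Feature/DarkMode", "fix/"]:
    print(f"{name!r}: {is_valid_branch(name)}")
```

Blocking direct pushes to main is usually done with a branch protection rule in the repository settings rather than in a script like this.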

Pull Requests (PR)

  • Code Reviews
    • Code review is a conversation, not an attack
  • Automation
    • Example: writing Closes #42 in a PR description automatically closes Issue #42 when the PR is merged into the default branch
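GitHub recognizes a small set of closing keywords (close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved) followed by an issue reference. The sketch below approximates that matching for same-repository `#N` references only; it is an illustration, not GitHub's actual parser, and `closed_issues` is a hypothetical name.

```python
import re

# The nine closing keywords, case-insensitive, followed by whitespace and "#<number>"
CLOSING = re.compile(r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE)


def closed_issues(pr_description: str) -> list[int]:
    """Issue numbers that merging this PR would close, in order of appearance."""
    return [int(number) for number in CLOSING.findall(pr_description)]


print(closed_issues("Closes #42. Also fixes #7; see #99 for background."))  # [42, 7]
```

A plain mention such as "see #99" only links the issue; it takes one of the keywords to close it.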

Releases & Packages

  • Release
    • A GitHub Release is built on top of a Git tag
    • It lets you attach release notes and compiled binaries
  • Package
    • Source code (GitHub repo) is different from an installable package (e.g. on PyPI or npm)
    • GitHub Packages can act as a private registry (e.g. to use with npm)

Automating Maintenance with GitHub Actions

  • What is CI/CD?
  • Common maintenance workflows

Continuous Integration

  • Automatically running tests and linters every time code is pushed
  • Include tests for each feature
  • Cover boundary conditions
  • Each time a bug is fixed, add a new test to make sure it doesn’t come up again
  • “Does this code break my package?”
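Here is what those three kinds of tests might look like for a small, hypothetical `gc_content` function, written in the style pytest collects automatically (functions named `test_*` in files named `test_*.py`). In CI the workflow would simply run `pytest` on every push; the function and its lowercase bug are made up for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence (hypothetical package function)."""
    if not seq:
        return 0.0  # boundary: define empty input instead of dividing by zero
    seq = seq.upper()  # fix for a (hypothetical) bug where lowercase bases were ignored
    return sum(base in "GC" for base in seq) / len(seq)


# Feature tests: the normal cases
def test_gc_content_basic():
    assert gc_content("GGCC") == 1.0
    assert gc_content("ATGC") == 0.5


# Boundary conditions
def test_gc_content_empty():
    assert gc_content("") == 0.0


def test_gc_content_no_gc():
    assert gc_content("ATAT") == 0.0


# Regression test: pins the lowercase bug so it cannot come back unnoticed
def test_gc_content_lowercase_regression():
    assert gc_content("atgc") == 0.5
```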

Continuous Deployment

  • Automatically publishing or deploying the code once CI passes
  • Releases become small and frequent rather than rare and large

GitHub Actions

Automation of tasks on GitHub

  • Workflows are defined in YAML files inside .github/workflows/

  • Events/Triggers

    • on: push
    • on: pull_request
    • on: schedule for cron jobs

GitHub Actions

  • Runners
    • Virtual machines hosted by GitHub that execute your workflow’s jobs
    • Many different architectures and operating systems are available
  • Jobs & Steps
    • A workflow contains jobs, which run in parallel unless a job declares a dependency on another (needs:)
    • Each job’s steps run sequentially, on the same runner
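"Parallel unless dependent" describes a dependency graph. A sketch using Python's standard `graphlib` module to group hypothetical jobs into waves that could run at the same time; the job names and the wave model are illustrative (GitHub starts each job as soon as its own `needs` are finished, rather than in strict waves).

```python
from graphlib import TopologicalSorter


def job_waves(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; each job only depends on jobs in earlier waves."""
    sorter = TopologicalSorter(needs)  # maps each job to the jobs it needs
    sorter.prepare()  # raises graphlib.CycleError if jobs depend on each other in a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical workflow: lint and test in parallel, then build, then deploy
needs = {
    "lint": set(),
    "test": set(),
    "build": {"lint", "test"},
    "deploy": {"build"},
}
print(job_waves(needs))  # [['lint', 'test'], ['build'], ['deploy']]
```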

Common Maintenance Workflows

  • Matrix Builds

    • Running the exact same test suite on Ubuntu, Windows, and macOS simultaneously using a matrix strategy to catch OS-specific bugs
  • Compilation of code when changes are pushed (e.g. for CD)

  • Dependabot

    • GitHub’s built-in tool that automatically opens PRs when your dependencies have known security vulnerabilities or newer versions are available
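A matrix strategy expands into one job per combination of its values (a Cartesian product), and `exclude` entries drop any combination they partially match. The sketch below mimics that expansion with `itertools.product` to show how quickly the job count grows; it ignores `include` and is an illustration, not GitHub's implementation. The runner labels are real GitHub-hosted labels; the Python versions are arbitrary.

```python
from collections.abc import Sequence
from itertools import product


def expand_matrix(
    matrix: dict[str, list[str]],
    exclude: Sequence[dict[str, str]] = (),
) -> list[dict[str, str]]:
    """One dict per job: every combination of matrix values, minus excluded ones."""
    keys = list(matrix)
    jobs = [dict(zip(keys, values)) for values in product(*matrix.values())]

    def excluded(job: dict[str, str]) -> bool:
        # Dropped if the job matches every key/value pair of some exclude entry
        return any(all(job.get(k) == v for k, v in rule.items()) for rule in exclude)

    return [job for job in jobs if not excluded(job)]


matrix = {
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
    "python": ["3.10", "3.12"],
}
jobs = expand_matrix(matrix, exclude=[{"os": "macos-latest", "python": "3.10"}])
print(len(jobs))  # 5 (3 operating systems x 2 versions, minus 1 excluded)
for job in jobs:
    print(job)
```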

Docker

Dependencies can be a pain to manage, especially for less technical collaborators

  • Global system dependencies
  • Mismatched Python/Node versions
  • OS differences

Virtual environments (like Python’s venv) and per-project package folders (like npm’s node_modules) help, but they don’t capture OS-level dependencies (like C++ compilers or database drivers)

Containers

  • Before shipping containers, loading a cargo ship took days of packing weirdly shaped items
  • Standard containers mean standard cranes and standard ships
  • Docker is standard packaging for code

Containers vs VMs

  • VMs virtualize the hardware and run a full guest OS (heavy)
  • Containers share the host OS kernel and only isolate the app and its libraries (lightweight and fast)

Docker Basics

  • Images vs. Containers
    • An Image is the recipe/blueprint
    • A Container is the running instance of that recipe

Sample Dockerfile

# Start from a small Debian-based image with Python 3.10 preinstalled
FROM python:3.10-slim

# Install samtools, an OS-level dependency pip cannot provide; clear the apt cache to keep the image small
RUN apt-get update && \
    apt-get install -y --no-install-recommends samtools && \
    rm -rf /var/lib/apt/lists/*

# Create /app inside the container and make it the working directory
WORKDIR /app

# Copy the Python dependency list first, so this layer is reused when only the scripts change
COPY requirements.txt .

# Install Python packages (e.g. biopython, pandas) during the build
RUN pip install --no-cache-dir -r requirements.txt

# Copy the actual analysis scripts into the image
COPY . .

# Default command that runs when the container starts
CMD ["python", "analyze_sequences.py"]

.dockerignore

  • Exclude node_modules, .git, large data files, and environment variable files (.env)
  • Helps keep images small
  • Keeps secrets (e.g. .env files) out of the image, avoiding security issues

Benefits for maintainers

Example: reviewers can check out a PR branch, run docker compose up, and test a complex app with a database right away, without installing a database or other dependencies on their local machine